A Novel Multi - Viewpoint based Similarity Measure for Document Clustering
نویسندگان
چکیده
Data mining is a process of analyzing data in order to bring about patterns or trends from the data. Many techniques are part of data mining techniques. Other mining techniques such as text mining and web mining also exists. Clustering is one of the most important data mining or text mining algorithm that is used to group similar objects together. In other words, it is used to organize the given objects into some meaningful sub groups that make further analysis on data easier. Clustered groups make search mechanisms easy and reduce the bulk of operations and the computational cost. Clustering methods are classified into data partitioning, hierarchical clustering, data grouping. The aim of this paper is to develop a new method that is used to cluster text documents that have sparse and high dimensional data objects. Like kmeans algorithm, the proposed algorithm work faster and provide consistent, high quality performance in the process of clustering text documents. The proposed similarity measure is based on the multi-viewpoint.
منابع مشابه
Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis
Clustering is one of the most interesting and important tool for research in data mining and other disciplines. The aim of clustering is to find the relationship among the data objects, and classify them into meaningful subgroups. The effectiveness of clustering algorithms depends on the appropriateness of the similarity measure between the data in which the similarity can be computed. This pap...
متن کاملMulti-viewpoint Based Similarity Measure and Optimality Criteria for Document Clustering
All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours i...
متن کاملHierarchical Divisive Clustering with Multi View-Point Based Similarity Measure
All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours i...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013